Reply to Office Action of April 9, 2003

REMARKS 

This Response is submitted in reply to the Office Action dated April 9, 2003. Claims 1- 
11 are pending in the patent application. Claims 1-11 have been canceled. New claims 12-29 
have been added. 

The Office Action requires that an Information Disclosure Statement be submitted for 
references cited in the specification. Additionally, the Office Action objects to the drawings, 
title, abstract, specification and claims 1, 5-8 and 10-11 based on informalities; rejects claims 1-3 
and 10-11 under 35 U.S.C. § 102(b); and rejects claims 4, 5 and 9 under 35 U.S.C. §103(a). 
Applicants respectfully submit, for the reasons set forth below, that the objections and rejections 
have been overcome. Accordingly, Applicants respectfully request reconsideration of the 
patentability of claims 12-29. 

At the outset, the Patent Office requires that an Information Disclosure Statement be 
submitted for the references cited on page 17 of the specification. Applicants have submitted an 
Information Disclosure Statement with this response which includes the references cited on page 
17. 

The Patent Office objects to the drawings because the drawings include the phrase "voice 
recognition" instead of the phrase "speech recognition." The Patent Office suggests that the 
phrase "speech recognition" is the correct phrase based on the invention disclosed by the 
specification. Applicants agree with the Patent Office and have amended the drawings to reflect 
the speech recognition system where applicable. Additionally, the Patent Office objects to the 
drawings because reference numeral 109 in Fig. 1 and reference numeral S13 in Fig. 10 are not
included in the specification. Applicants have amended the specification to include these 
reference numerals. No new matter has been added. Replacement drawing sheets including these 
changes have been submitted herewith to replace the original drawing sheets submitted with the 
patent application. 

The Patent Office states that the title is not descriptive. Applicants have amended the 
title to be more descriptive of the invention. Additionally, the Patent Office objects to the 
Abstract as to informalities. Specifically, the Patent Office contends that the phrase "voice 
recognizing" is subject to misinterpretation. The Patent Office suggests that this phrase should 
be changed to "speech recognition." Applicants have amended the Abstract to correct these
informalities. 

The Patent Office also states that the specification includes several minor errors and/or 
informalities which require correction. Applicants have reviewed the specification and have 
amended the specification to remove the minor errors cited by the Patent Office and any other 
informalities discovered by the Applicants. 

Claims 1-3 and 10-11 were rejected under 35 U.S.C. § 102(b) as being anticipated by U.S.
Patent No. 5,685,000 to Cox, Jr. ("Cox Jr."). Additionally, claims 4 and 5 were rejected under
35 U.S.C. § 103(a) as being unpatentable over Cox Jr. in view of U.S. Patent No. 5,916,024 to
Von Kohorn ("Von Kohorn"). Moreover, claim 9 was rejected under 35 U.S.C. § 103(a) as being
unpatentable over Cox Jr. in view of U.S. Patent No. 6,477,509 B1 to Hammons et al.
("Hammons et al."). As stated above, claims 1-11 have been cancelled. Therefore, the rejections
of claims 1-11 are now moot.

The Office Action states that claims 6, 7 and 8 were objected to as being dependent upon 
a rejected base claim, but would be allowable if rewritten in independent form including all of 
the limitations of the base claim and any intervening claims. New claim 12 includes the elements 
of independent claim 1 and objected claim 6. Therefore, new claim 12, as well as new claims 13- 
17, which depend from new claim 12, are now in condition for allowance. 

New claim 18 includes the elements of independent claim 1 and objected claim 7. 
Therefore, new claim 18, as well as new claims 19-23, which depend from new claim 18, are in 
condition for allowance. 

New claim 24 includes the elements of independent claim 1 and objected claim 8. 
Therefore, new claim 24, as well as new claims 25-29, which depend from new claim 24, are in 
condition for allowance. 

Applicants respectfully submit that new claims 12-29 are allowable and non-obvious over 
the art of record. For the foregoing reasons, Applicants respectfully solicit an early allowance of 
these claims. 
A Petition for a One-Month Extension to Respond to the Office Action is submitted
herewith. A check in the amount of $110.00 is submitted herein to cover the cost of the one- 
month extension. If any other fees are due in connection with this application as a whole, the 
Patent Office is authorized to deduct such fees from deposit account 02-1818. If such a 
withdrawal is made, please indicate the attorney docket number (112857-246) on the account 
statement. 

Respectfully submitted, 
BELL, BOYD & LLOYD LLC 
BY 

Christopher S. Hermanson 

Reg. No. 48,244 

P.O. Box 1135 

Chicago, Illinois 60690-1135 

Phone: (312) 807-4225 

Dated: July 22, 2003
Amendments to the Specification:

Please replace the paragraph beginning at page 1, line 1 with the following rewritten 
specification. A marked up version and a clean version of the amended specification are 
included in this section. 

MARKED-UP VERSION OF THE SUBSTITUTE SPECIFICATION 

COLLECTING INFORMATION USING SPEECH INFORMATION PROCESSING 

APPARATUS, INFORMATION 

PROCESSING METHOD, AND STORAGE MEDIUM

BACKGROUND OF THE INVENTION

Field of the Invention 

The present invention relates to an information processing apparatus, an information 
processing method, and a storage medium. More particularly, the present invention relates to an 
information processing apparatus and method which can easily collect user information 
indicating, e.g., interests and tastes of users, as well as a storage medium storing a program 
required for executing the information processing. 

Description of the Related Art

For example, WWW (World Wide Web) servers constructed on the Internet, which has
recently become more prevalent with rapidly expanding popularity, provide a great deal
of information. It is difficult for users to search for desired information from among such a great
deal of information by themselves. Web pages called search engines are therefore
presented.

Web pages serving as search engines are provided by, e.g., INFOSEEK® and YAHOO®.
When searching information provided by WWW servers, users perform such
predetermined operations as accessing web pages serving as search engines, and entering
keywords regarding information to be searched for. As a result, the users can obtain search
results from the search engines.

However, even when utilizing a search engine to search for information, various
categories of information containing an entered keyword are provided as search results. Users
are therefore required to seek desired items from among those various categories of information
by themselves, which is troublesome.

One conceivable solution is to prepare a profile representing user information regarding,
e.g., interests and tastes of a user in advance, and to present those items of information among
search results of a search engine which match the profile of the user.

In such a conventional method, however, a user is required to manipulate a keyboard, a 
mouse or the like to enter answers for various questions in order to acquire user information 
necessary for preparing a user profile. The conventional method therefore imposes a large 
burden on the user. 

SUMMARY OF THE INVENTION 

In view of the state of the art set forth above, it is an object of the present invention to
easily collect user information regarding interests and tastes of users.

To achieve the above object, an information processing apparatus according to the present
invention comprises a speech recognizing unit for recognizing the speech of a user; a
dialog sentence creating unit for creating a dialog sentence to exchange a dialog with the user
based on a result of the speech recognition performed by the speech recognizing unit;
and a collecting unit for collecting the user information based on the speech recognition
result.

The information processing apparatus may further comprise a storage unit for storing the 
user information. 

The dialog sentence creating unit may output the dialog sentence in the form of a text or 
synthesized sounds. 

The collecting unit may collect the user information based on an appearance frequency of 
a word contained in the speech recognition result.

Also, the collecting unit may collect the user information based on a broader term of a
word contained in the speech recognition result.

Further, the collecting unit may count the number of times the same
topic is mentioned or included in the speech based on the speech recognition result, and
may collect the user information based on the counted number.

Still further, the collecting unit may track a time interval when
the same topic is mentioned or included in the speech based on the speech recognition
result, and may collect the user information based on the time interval.

Still further, the collecting unit may count the number of times the
same topic is mentioned or included in the speech based on the speech recognition result,
and may collect the user information based on the counted number.

The user information may be information indicating interests or tastes of the user. 

An information processing method according to the present invention comprises a
speech recognizing step of recognizing the speech of a user; a dialog sentence
creating step of creating a dialog sentence to exchange a dialog with the user based on a result of
the speech recognition performed by the speech recognizing step; and a collecting step
of collecting the user information based on the speech recognition result.

A storage medium according to the present invention stores a program comprising a
speech recognizing step of recognizing the speech of a user; a dialog sentence
creating step of creating a dialog sentence to exchange a dialog with the user based on a result of
the speech recognition performed by the speech recognizing step; and a collecting step of
collecting the user information based on the speech recognition result.

With the information processing apparatus, the information processing method, and the
storage medium according to the present invention, the speech of a user is recognized
and a dialog sentence for exchanging a dialog with the user is created based on a result of the
speech recognition. Also, user information is collected based on the speech
recognition result.
BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram showing an example of the configuration of a computer as one
embodiment of the present invention.

Fig. 2 is a block diagram of one embodiment of an interactive user-profile collecting
system whose function is realized by the computer shown in Fig. 1.

Fig. 3 is a block diagram showing an example of the configuration of a speech recognizing
unit.

Fig. 4 is a block diagram showing an example of the configuration of a language
processing unit.

Fig. 5 is a block diagram showing an example of the configuration of a dialog managing
unit.

Fig. 6 is a block diagram showing an example of the configuration of a user information
managing unit.

Figs. 7A and 7B are tables showing examples of profile management information and a
user profile, respectively.

Fig. 8 is a block diagram showing an example of the configuration of a response
generating unit.

Fig. 9 is a flowchart showing a first embodiment of profile collection processing.

Fig. 10 is a flowchart showing a second embodiment of the profile collection processing.

Fig. 11 is a flowchart showing a third embodiment of the profile collection processing.

Fig. 12 is a flowchart showing a fourth embodiment of the profile collection processing.

Fig. 13 is a flowchart showing a fifth embodiment of the profile collection processing.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Preferred embodiments of the present invention will be described below with reference to 
the drawings. 

Fig. 1 shows an example of the configuration of a computer as one embodiment of the 
present invention. 
The computer shown in Fig. 1 installs therein a program for executing a sequence of
processing steps described later. 

The program can be stored in a hard disk 105 or a ROM (Read Only Memory) 103 
beforehand, which are incorporated as storage mediums in the computer. 

As an alternative, the program may be temporarily or permanently stored (recorded) in a
removable storage medium 111 such as a floppy disk, CD-ROM (Compact Disc Read Only
Memory), MO (Magneto-optical) disk, DVD (Digital Versatile Disc), magnetic disk, or a
semiconductor memory. Such a removable storage medium 111 can be provided in the form of
so-called package software.

A manner of installing the program in the computer is not limited to the above-described 
one using the removable storage medium 111. The program may be transferred from a download 
site to the computer over the air via an artificial satellite for digital satellite broadcasting, or may 
be transferred to the computer through wire via a network such as the Internet. In any case, the 
computer receives the transferred program by a communicating unit 108 and installs the program 
in the internal hard disk 105. 

The computer incorporates a CPU (Central Processing Unit) 102 therein. An input/output 
interface 110 is connected to the CPU 102 via a bus 101. When a command is inputted through 
the input/output interface 110 upon the user manipulating an input unit 107 constituted by a 
keyboard, a mouse or the like, the CPU 102 runs the program stored in the ROM 103 in 
accordance with the command. Also, the CPU 102 loads, into a RAM (Random Access 
Memory) 104, the program stored in the hard disk 105, or the program transferred via a satellite 
or a network and installed in the hard disk 105 after being received by the communicating unit 
108, or the program installed in the hard disk 105 after being read out of the removable storage 
medium 111 inserted in a drive 109, and then runs the loaded program. By so running the
program, the CPU 102 executes processing in accordance with flowcharts described later, or 
processing in accordance with block diagrams described later. After that, the CPU 102 outputs a 
result of the processing from an output unit 106 constituted by an LCD (Liquid Crystal Display), 
a speaker or the like through the input/output interface 110, or transmits it from the 
communicating unit 108 through the input/output interface 110, or stores it in the hard disk 105, 
as required. 
In this embodiment, a program for operating the computer to function as an interactive
user-profile collecting system, described later, is installed. When the CPU 102 runs that installed 
program, the computer functions as an interactive user-profile collecting system shown in Fig. 2. 

Fig. 2 shows an example of the configuration of one embodiment of the interactive 
user-profile collecting system whose function is realized by the computer shown in Fig. 1 with 
the CPU 102 running the relevant program. 

When a voice dialog is performed between the computer and a user, the interactive
user-profile collecting system collects user information regarding, e.g., interests and tastes of the
user based on speech, etc. spoken by the user in the dialog, and stores (records) the
collected user information as a user profile.

More specifically, speech or words spoken by the user are inputted to a
speech recognizing unit 1. The speech recognizing unit 1 recognizes the input
speech and outputs a text (phoneme information), which is obtained as a result of the
speech recognition, to a language processing unit 2. Also, the speech recognizing
unit 1 extracts rhythm information of the speech spoken by the user, and outputs the
extracted rhythm information to a dialog managing unit 3.

The language processing unit 2 carries out language processing of the speech
recognition result outputted from the speech recognizing unit 1, and outputs information
regarding words, syntax and meaning contained in the speech recognition result, as a result
of the language processing, to the dialog managing unit 3.

The dialog managing unit 3 performs dialog management for generating a sentence for
use in exchanging a dialog with the user (i.e., a dialog sentence), and extracts the user
information. More specifically, the dialog managing unit 3 produces response generation
information, which instructs generation of a response sentence, etc. in reply to the user's
speech recognized by the speech recognizing unit 1, based on, for example, the
language processing result outputted from the language processing unit 2, and outputs the
response generation information to a response generating unit 5. Also, the dialog managing unit
3 collects the user information indicating interests and tastes of the user based on, for example,
the language processing result outputted from the language processing unit 2 and the phoneme
information outputted from the speech recognizing unit 1, and supplies the collected user
information to a user information management unit 4.

The user information management unit 4 stores, as a user profile, the user information 
supplied from the dialog managing unit 3. 

In accordance with the response generation information supplied from the dialog
managing unit 3, the response generating unit 5 generates a response sentence, etc. in reply to the
user's speech and outputs it in the form of synthesized sounds.

Thus, in the interactive user-profile collecting system having the above-described
configuration, speech spoken by a user is recognized by the speech recognizing unit
1, and a result of the speech recognition is supplied to the language processing unit 2. The
language processing unit 2 interprets the meaning (contents) of the speech recognition result
from the speech recognizing unit 1, and supplies a result of the language processing to the
dialog managing unit 3. Based on an output of the language processing unit 2, the dialog
managing unit 3 produces response generation information for generating a response sentence,
etc. in reply to the user's speech, and then supplies the response generation
information to the response generating unit 5. In accordance with the response generation
information from the dialog managing unit 3, the response generating unit 5 generates the
response sentence, etc. and outputs it in the form of synthesized sounds.

When the user speaks in reply to the response sentence, etc. outputted from the response
generating unit 5, a speech uttered by the user is subjected to speech recognition by the
speech recognizing unit 1. Subsequently, the above-described processing is repeated,
whereby the dialog between the user and the computer progresses.

In parallel to the dialog progressing performed in such a way, the dialog managing unit 3 
collects user information indicating interests and tastes of the user based on the outputs of both 
the speech recognizing unit 1 and the language processing unit 2, and supplies the collected
user information to the user information management unit 4. The user information management 
unit 4 then stores, as a user profile, the user information supplied from the dialog managing unit 
3. 

Accordingly, in the interactive user-profile collecting system of Fig. 2, a dialog is
performed between the user and the computer, and the user information is collected during the
dialog without the user being conscious of the collection. As a result, the user information can be easily
collected and stored (without causing the user to feel any burden).
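
By way of illustration only, the flow of one dialog turn through the units of Fig. 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the implementation of the embodiment: the function and class names, the whitespace-based word splitting, the fixed list of interest words, and the response wording are all hypothetical stand-ins, and speech input is replaced by a text string.

```python
from collections import Counter


def recognize_speech(utterance):
    """Stand-in for the speech recognizing unit 1.

    A real unit would process audio; here the "recognition result" is
    simply the lower-cased input text (assumption for illustration).
    """
    return utterance.lower()


def process_language(text):
    """Stand-in for the language processing unit 2: naive word split."""
    words = (w.strip(".,!?") for w in text.split())
    return [w for w in words if w]


class DialogManager:
    """Stand-in for the dialog managing unit 3.

    The Counter plays the role of the user profile kept by the user
    information management unit 4.
    """

    def __init__(self, interest_words):
        self.interest_words = set(interest_words)  # hypothetical word list
        self.profile = Counter()

    def handle(self, words):
        # Collect user information in parallel with producing a response.
        hits = [w for w in words if w in self.interest_words]
        self.profile.update(hits)
        if hits:
            return f"Tell me more about {hits[0]}."
        return "I see. Please go on."


def generate_response(response_info):
    """Stand-in for the response generating unit 5 (no speech synthesis)."""
    return response_info


def dialog_turn(manager, utterance):
    """Run one user utterance through units 1, 2, 3 and 5 in order."""
    text = recognize_speech(utterance)
    words = process_language(text)
    return generate_response(manager.handle(words))
```

In this sketch the profile is updated as a side effect of each turn, which mirrors the point made above that the user information is gathered while the dialog progresses rather than through a separate questionnaire.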

Fig. 3 shows an example of the functional configuration of the speech recognizing unit 1 in
Fig. 2.

A speech by the user is inputted to a microphone 11 that converts the speech into a voice
signal in the form of an electrical signal. The voice signal is supplied to an A/D
(Analog-to-Digital) converter 12. The A/D converter 12 carries out sampling and quantization of
the voice signal in the form of an analog signal supplied from the microphone 11 for conversion
into speech data in the form of a digital signal. The speech data is supplied to a
feature extracting unit 13.

For each appropriate frame of the speech data supplied from the A/D converter 12,
the feature extracting unit 13 extracts feature parameters such as a spectrum, a linear prediction 
coefficient, a cepstrum coefficient, a linear spectrum pair and an MFCC (Mel Frequency 
Cepstrum Coefficient), and then supplies the extracted feature parameters to a matching unit 14. 

Based on the feature parameters supplied from the feature extracting unit 13, the
matching unit 14 recognizes speech inputted to the microphone 11 (i.e., input
speech) while referring to an acoustic model database 15, a dictionary database 16 and a
grammar database 17 as required.

More specifically, the acoustic model database 15 stores acoustic models representing
acoustic features such as individual phonemes and syllables in the language of the speech
to be recognized. For example, an HMM (Hidden Markov Model) can be used
as the acoustic model. The dictionary database 16 stores a word dictionary describing
information about pronunciations of individual words to be recognized. The grammar database
17 stores grammar rules defining how the individual words registered in the word dictionary of
the dictionary database 16 are linked with each other. For example, rules based on Context Free
Grammar (CFG), HPSG (Head-driven Phrase Structure Grammar), statistical word linkage
probability (N-gram), etc. can be used herein as the grammar rules.
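
As an illustration of the statistical word linkage probability (N-gram) mentioned above, the following Python sketch estimates bigram (N = 2) probabilities from a small set of example sentences. It is a hedged example only: the function name, the sentence-start marker "<s>", and the use of unsmoothed maximum-likelihood estimates are assumptions made for illustration, and the specification does not state how the grammar database 17 obtains or stores such probabilities.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Estimate P(next word | previous word) from whitespace-split sentences.

    Unsmoothed maximum-likelihood estimate (an assumption for this sketch):
    count(prev, next) / count(prev used as a history).
    """
    history_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()  # "<s>" marks the sentence start
        for prev, nxt in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            pair_counts[(prev, nxt)] += 1
    return {
        pair: count / history_counts[pair[0]]
        for pair, count in pair_counts.items()
    }
```

With the two example sentences "record the program" and "record the news", the pair ("record", "the") receives probability 1.0 and the pair ("the", "program") receives probability 0.5, so a recognizer consulting such a table would prefer word sequences that were linked more often in the example data.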

The matching unit 14 constructs an acoustic model of each word (i.e., a word model) by
connecting relevant ones of the acoustic models stored in the acoustic model database 15 with
each other while referring to the word dictionary stored in the dictionary database 16. Further,
the matching unit 14 connects several word models with each other while referring to the
grammar rules stored in the grammar database 17, and recognizes the speech inputted to
the microphone 11 with the HMM method, for example, based on the feature parameters by
using the word models thus connected.

Phoneme information obtained as a result of the speech recognition executed by the
matching unit 14 is outputted to the language processing unit 2 in the form of, e.g., a text.

Also, the matching unit 14 extracts rhythm information of the speech inputted to
the microphone 11 and outputs the extracted rhythm information to the dialog managing unit 3.
More specifically, by way of example, the matching unit 14 counts the mora number in the result
of the speech recognition obtained as described above, calculates the mora number per
frame, etc., and outputs a calculation result as a user speaking speed to the dialog managing unit
3.
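
The calculation of the mora number per frame described above can be sketched as follows. This is an illustrative Python sketch under explicit assumptions: it assumes that the recognition result is available as a Japanese kana string (the specification does not state the language or the text form), it counts one mora per kana character except for small kana that combine with the preceding character, and the function names are hypothetical.

```python
# Small kana that merge with the preceding kana into a single mora
# (assumption: the recognition result is a hiragana/katakana string).
SMALL_KANA = set("ゃゅょぁぃぅぇぉャュョァィゥェォ")


def count_morae(kana_text):
    """Count morae: every kana is one mora unless it is a small kana."""
    return sum(1 for ch in kana_text if ch not in SMALL_KANA)


def speaking_speed(kana_text, frame_count):
    """Return the mora number per frame for one recognized utterance."""
    if frame_count <= 0:
        raise ValueError("frame_count must be positive")
    return count_morae(kana_text) / frame_count
```

For example, the word がっこう is counted as four morae under this rule, so an utterance of that word spanning 80 frames yields a speaking speed of 0.05 morae per frame.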

Fig. 4 shows an example of the functional configuration of the language processing unit 2 
in Fig. 2. 

The text (phoneme information) outputted as the speech recognition result from the
speech recognizing unit 1 (the matching unit 14 in Fig. 3) is inputted to a text analyzer 21.
The text analyzer 21 analyzes the input text while referring to a dictionary database 23 and an 
analysis grammar database 24. 

More specifically, the dictionary database 23 stores a word dictionary describing a 
notation of each word, part-of-speech information required to apply the grammar for analyzing 
the text, etc. The analysis grammar database 24 stores analysis grammar rules defining 
restrictions, etc. with respect to word linkage based on the information of each word described in 
the word dictionary of the dictionary database 23. Then, based on the word dictionary and the 
analysis grammar rules, the text analyzer 21 analyzes morphemes of the text (speech recognition
result) inputted to it, and outputs an analysis result to a syntax/meaning analyzer 22. 

Based on the output of the text analyzer 21, the syntax/meaning analyzer 22 performs
syntax analysis of the speech recognition result from the speech recognizing unit 1
and interpretation of the meaning thereof while referring to a dictionary database 25 and an
analysis grammar database 26. Further, the syntax/meaning analyzer 22 adds, to the speech
recognition result from the speech recognizing unit 1, information representing the concept and
meaning of each of the words contained in the speech recognition result, and then outputs
an addition result, as a language processing result, to the dialog managing unit 3.

The dictionary database 25 and the analysis grammar database 26 store information
similar to that stored in the dictionary database 23 and the analysis grammar database 24,
respectively. Furthermore, the syntax/meaning analyzer 22 performs syntax analysis and
interpretation of the meaning by using regular grammar, Context Free Grammar (CFG),
HPSG, and statistical word linkage probability (N-gram), etc. 

Fig. 5 shows an example of the functional configuration of the dialog managing unit 3 in
Fig. 2.

The speaking speed as the rhythm information outputted from the speech
recognizing unit 1 (the matching unit 14 in Fig. 3) and the processing result from the language
processing unit 2 (the syntax/meaning analyzer 22 in Fig. 4) (i.e., the language processing result)
are inputted to a dialog processor 31. Based on the language processing result from the language
processing unit 2, the dialog processor 31 produces response generation information for
instructing generation of a response sentence, etc. in reply to the speech recognition result
from the speech recognizing unit 1 while referring to a scenario database 34 and a
knowledge database 35.

More specifically, the scenario database 34 stores a scenario describing, e.g., a dialog 
pattern between the computer and the user for each task (topic), and the dialog processor 31 
produces the response generation information in accordance with the scenario. 

For an object-oriented task such as presetting a VCR to record a program, the following 
scenario is stored, by way of example, in the scenario database 34: 

(action(Question(date, start time, end time, channel)))
(date ???) #date
(start time ???) #start time
(end time ???) #end time
(channel ???) #channel (1)

According to the above scenario (1), when the language processing result from the
language processing unit 2 represents a request for presetting a VCR to record a program, the 
dialog processor 31 produces the response generation information that instructs generation of 
sentences for questioning the date to record the program, the start time to record the program, the 
end time to end the recording, and the channel of the program to be recorded, in the order named. 
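
The behavior described for scenario (1), in which the items are asked about in the order named, can be sketched in Python as follows. This is a minimal sketch and not the stored scenario format: the slot list mirrors the four items of scenario (1), while the function name, the dictionary of already-known values, and the wording of the questions are assumptions made for illustration.

```python
# Slots of scenario (1), in the order in which they are to be asked.
SLOTS = ["date", "start time", "end time", "channel"]


def next_question(filled):
    """Return a question for the first slot still unknown, or None.

    `filled` maps slot names to values already obtained from the user
    (hypothetical representation; "???" in scenario (1) corresponds to
    a missing or None entry here).
    """
    for slot in SLOTS:
        if filled.get(slot) is None:
            return f"What is the {slot}?"
    return None  # every slot is filled, so the recording can be preset
```

Calling next_question with an empty dictionary asks for the date first, and calling it again after each answer has been stored walks through the start time, the end time, and the channel in turn.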

Also, as a scenario to perform a non-objective dialog (so-called chat), which is
represented by a dialog program such as ELIZA™ (for ELIZA™, see, e.g., Weizenbaum, Joseph,
"ELIZA™ - a computer program for the study of natural language communication between man
and machine", Communications of the ACM 9, 1966, and James Allen, "Natural Language
Understanding", The Benjamin/Cummings Publishing Company Inc., pp. 6-9), the following one is
stored, by way of example, in the scenario database 34:

If X exists then speak (Y)

#X: keyword, Y: response sentence

(money What do you want?) #(X Y)

(want to eat Are you hungry?) (2)

According to the above scenario (2), if a keyword "money" is included in the language 
processing result from the language processing unit 2, the dialog processor 31 produces the 
response generation information for instructing generation of a sentence to ask a question "What 
do you want?". Also, if a keyword "want to eat" is included in the language processing result 
from the language processing unit 2, the dialog processor 31 produces the response generation 
information for instructing generation of a sentence to ask a question "Are you hungry?". 
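
The rule "If X exists then speak (Y)" of scenario (2) can be illustrated by the following Python sketch. The two keyword and response pairs are taken from scenario (2); the function name, the case-insensitive substring matching, and the first-match-wins ordering are assumptions made for illustration and are not stated in the specification.

```python
# (keyword X, response sentence Y) pairs taken from scenario (2).
RULES = [
    ("money", "What do you want?"),
    ("want to eat", "Are you hungry?"),
]


def respond(language_result):
    """Return the response sentence of the first rule whose keyword occurs.

    `language_result` is treated as plain text; matching is a simple
    case-insensitive substring test (assumption for this sketch).
    Returns None when no keyword is present.
    """
    lowered = language_result.lower()
    for keyword, response in RULES:
        if keyword in lowered:
            return response
    return None
```

For instance, respond("I want to eat pizza") returns "Are you hungry?", while an utterance containing neither keyword returns None so that another scenario or the general knowledge can be consulted instead.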

The knowledge database 35 stores general knowledge necessary for performing a dialog 
between the user and the computer. More specifically, the knowledge database 35 stores, as 
general knowledge, such information that, when the language processing result from the 
language processing unit 2 represents that the user has uttered a greeting, the information 
instructs the dialog processor 31 to issue a greeting in reply to the user greeting. Also, the 
knowledge database 35 stores, as general knowledge, topics and so on to be used in a chat. 

Further, the knowledge database 35 stores, as general knowledge, information about
matters for inquiring user information regarding interests and tastes of the user (such as
items to be inquired, intervals (time) of inquiries, and the number of times of inquiries).

Thus, the dialog processor 31 produces the response generation information while 
referring to the above-described knowledge in the knowledge database 35 as needed. 

In addition, the dialog processor 31 executes profile collection processing to collect user 
information regarding interests and tastes of the user based on the speaking speed as the rhythm 
information outputted from the speech recognizing unit 1, the language processing result
from the language processing unit 2, an output of an extractor 32, a dialog history stored in a 
dialog history storage 33, profile registry information stored in the user information management 
unit 4, etc., and to supply, to the user information management unit 4, profile control information 
for instructing the collected user information to be reflected in a user profile. 

In other words, the dialog processor 31 recognizes interests and tastes of the user based 
on, e.g., words contained in the language processing result from the language processing unit 2 
(or words contained in the speech recognizing result from the speech recognizing
unit 1) and broader terms of those words. Then, in accordance with a recognition result, the 
dialog processor 31 produces the profile control information and supplies it to the user 
information management unit 4. 

Further, based on the speaking speed obtained as the rhythm information from the speech 
recognizing unit 1, the language processing result from the language processing unit 2 and 
so on, the dialog processor 31 determines whether the topic in a dialog between the user and the 
computer has shifted (changed), thereby recognizing the number of times the same topic is 
mentioned, a time the topic was mentioned, etc. Then, in accordance 
with a recognition result, the dialog processor 31 produces the profile control information and 
supplies it to the user information management unit 4. 

In response to a request from the dialog processor 31, the extractor 32 extracts those ones 
among the words contained in the language processing result from the language processing unit 
2, which are available as the information regarding interests and tastes of the user, and supplies 
the extracted words to the dialog processor 31. Also, the extractor 32 recognizes broader terms 
of the words contained in the language processing result from the language processing unit 2 by 
referring to a concept information database 36, and supplies the recognized broader terms to the 
dialog processor 31. 

The concept information database 36 stores, e.g., a thesaurus expressing word concepts in 
a hierarchy structure, and the extractor 32 retrieves which concept on the thesaurus each word 
belongs to, thereby recognizing a broader term of the word. 
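By way of illustration only, the broader-term lookup performed by the extractor 32 could be sketched as follows. The thesaurus contents and the function name `broader_term` are hypothetical and are not taken from the specification; a real thesaurus would hold a much deeper concept hierarchy.

```python
# Hypothetical thesaurus: each word maps to its parent concept.
# None marks a top-level concept with no broader term.
THESAURUS = {
    "director": "movie",
    "cast": "movie",
    "guitar": "music",
    "movie": "entertainment",
    "music": "entertainment",
    "entertainment": None,
}


def broader_term(word, thesaurus=THESAURUS):
    """Return the immediate broader term of `word`, or None if unknown or top-level."""
    return thesaurus.get(word)
```

In this sketch an unknown word simply yields no broader term, which is one possible design choice among several.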

The dialog history storage 33 stores a history of the dialog between the user and the 
computer (i.e., a dialog history) in response to a request from the dialog processor 31. Herein, 
the dialog history includes not only the language processing result received by the dialog 
processor 31 from the language processing unit 2 and the response generation information 
produced depending on the language processing result, but also the number of times the same 
topic is mentioned, the time each response was mentioned, the 
time of each utterance by the user, etc. as required. These items of information form the 
dialog history, which is supplied from the dialog processor 31 to the dialog history storage 33. 

Fig. 6 shows an example of the functional configuration of the user information 
management unit 4 in Fig. 2. 

The profile control information outputted from the dialog managing unit 3 (the dialog 
processor 31 in Fig. 5) is supplied to a recording/reproducing unit 41. In accordance with the 
profile control information, the recording/reproducing unit 41 records the user information 
regarding interests and tastes of the user in the user profile of the profile database 42. 

Also, in response to a request from the dialog managing unit 3 (the dialog processor 31 in 
Fig. 5), the recording/reproducing unit 41 reads profile management information recorded in the 
profile database 42 and supplies it to the dialog managing unit 3 (the dialog processor 31 in Fig. 
5). 

The profile database 42 stores profile management information and a user profile shown 
respectively, by way of example, in Figs. 7A and 7B. 

More specifically, Fig. 7A shows the profile management information. In an example of 
Fig. 7A, the profile management information is made up of an identifier, interest information, 
and a threshold. The identifier is to identify the interest information, and has a unique value for 
each item of the interest information. The interest information represents categories (fields) 
indicating interests and tastes of the user. "Movie", "music", "car", "book" and "travel" are 
registered as items of the interest information in the example of Fig. 7A. The threshold is set for 
each item of the interest information, and has a registered value to be compared with the number 
of times, described later, recorded in the user profile. 

Fig. 7B shows the user profile. In an example of Fig. 7B, the user profile is made up of 
an identifier, interest information, the number of times, and an interest flag. The identifier and 
the interest information are the same as those of the profile management information. The 
number of times represents a value obtained by estimating how many times the user has shown 
an interest in each category indicated by the interest information. The interest flag is a flag of, 
e.g., one bit. Only the interest flags corresponding to the items of the interest information that 
indicate the categories matching the interests and tastes of the user are set to "1", for 
example, and the other interest flags are set to "0". With the user profile of Fig. 7B, therefore, 
the categories indicated by the interest information for which the interest flags are set to "1" 
match the interests and tastes of the user. 
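Purely as an illustrative sketch, the profile management information of Fig. 7A and the user profile of Fig. 7B could be modeled with two record types. The class names, field names, and the threshold value of 3 are assumptions made for this example; the specification does not give concrete threshold values.

```python
from dataclasses import dataclass


@dataclass
class ManagementEntry:
    """One row of the profile management information (Fig. 7A)."""
    identifier: int
    interest: str
    threshold: int  # assumed value, compared against ProfileEntry.count


@dataclass
class ProfileEntry:
    """One row of the user profile (Fig. 7B)."""
    identifier: int
    interest: str
    count: int = 0          # the "number of times"
    interest_flag: int = 0  # one-bit flag: 1 = matches the user's interests


CATEGORIES = ["movie", "music", "car", "book", "travel"]
management = {c: ManagementEntry(i, c, threshold=3)
              for i, c in enumerate(CATEGORIES, start=1)}
profile = {c: ProfileEntry(i, c) for i, c in enumerate(CATEGORIES, start=1)}


def interests(user_profile):
    """List the categories whose interest flag is set to 1."""
    return [e.interest for e in user_profile.values() if e.interest_flag == 1]
```

The identifier is shared between the two records so that a row of one can be matched to a row of the other.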

Fig. 8 shows an example of the functional configuration of the response generating unit 5 
in Fig. 2. 

The response generation information is supplied to a response sentence generator 51 from 
the dialog managing unit 3 (the dialog processor 31 in Fig. 5). The response sentence generator 
51 generates a response sentence in the form of a text corresponding to the response generation 
information while referring to a template database 55, a generation grammar database 56 and a 
dictionary database 57 as required, and then supplies the generated response sentence to a text 
analyzer 52. 

More specifically, the template database 55 stores templates representing examples of the 
response sentence. The generation grammar database 56 stores grammar rules such as 
conjugation rules of words necessary for generating the response sentence and information about 
restrictions in the word sequence. The dictionary database 57 stores a word dictionary describing 
information of each word, such as a part of speech, pronunciation and an accent. The response 
sentence generator 51 generates a response sentence corresponding to the response generation 
information from the dialog managing unit 3 while referring to the templates, the grammar rules 
and the word dictionary as required, and then supplies the generated response sentence to the text 
analyzer 52. 

Note that the method of generating a sentence is not limited to one employing templates, 
but may be practiced using, for example, a method based on the case structures. 

The text analyzer 52 analyzes a text as the response sentence from the response sentence 
generator 51 while referring to the dictionary database 57 and an analysis grammar database 58. 

More specifically, the dictionary database 57 stores the word dictionary described above. 
The analysis grammar database 58 stores analysis grammar rules such as restrictions on word 
linkage for the words contained in the word dictionary of the dictionary database 57. Based on 
the word dictionary and the analysis grammar rules, the text analyzer 52 performs analysis, such 
as morpheme analysis and syntax analysis, of the response sentence from the response sentence 
generator 51, and extracts information necessary for ruled speech synthesis to be executed 
in a subsequent rule synthesizer 53. The information necessary for the ruled speech 
synthesis includes, e.g., information for controlling pause positions, accents and intonations, other 
rhythm information, and phoneme information such as pronunciations of individual words. 

The information obtained by the text analyzer 52 is supplied to the rule synthesizer 53. 
The rule synthesizer 53 creates speech data (digital data) in the form of synthesized sounds 
corresponding to the response sentence, which has been generated in the response sentence 
generator 51, by using a sound fragment database 59. 

More specifically, the sound fragment database 59 stores sound fragment data in the form 
of, e.g., CV (Consonant, Vowel), VCV, and CVC. Based on the information from the text 
analyzer 52, the rule synthesizer 53 connects required sound fragment data to each other, and 
then adds pauses, accents and intonations in proper positions, thereby creating speech data in the 
form of synthesized sounds corresponding to the response sentence which has been generated in 
the response sentence generator 51. 

The created speech data is supplied to a D/A (Digital-to-Analog) converter 54 for 
conversion into a speech signal as an analog signal. The speech signal is supplied to 
a speaker (not shown), which outputs the synthesized sounds corresponding to the 
response sentence generated in the response sentence generator 51. 

The profile collection processing executed by the dialog managing unit 3 in Fig. 5 for 
collecting user information regarding interests and tastes of the user and reflecting the user 
information in a user profile (Fig. 7B) will be described below with reference to flowcharts of 
Figs. 9 through 13. 

When a user utters a speech or speaks and the words spoken by the user are 
recognized by the speech recognizing unit 1 and subjected to language processing by the 
language processing unit 2, the speaking speed obtained as the rhythm information of the user's 
speech by the speech recognizing unit 1 and the language processing result from the 
language processing unit 2 are supplied to the dialog processor 31 of the dialog managing unit 3. 
The dialog processor 31 supplies the language processing result from the language processing 
unit 2 to the extractor 32, causing it to extract a predetermined keyword contained in the 
language processing result, and stores the extracted keyword as part of a dialog history in the 
dialog history storage 33. Thereafter, the dialog processor 31 executes the profile collection 
processing described below. 

Herein, therefore, the profile collection processing is executed whenever the user 
speaks. However, the profile collection processing may be executed after several 
words are exchanged between the user and the computer, or at intervals of a certain period of 
time. 

Fig. 9 is a flowchart showing a first embodiment of the profile collection processing. 

In the embodiment of Fig. 9, the dialog processor 31 first, in step S1, focuses attention 
on a certain one of the words registered in the dialog history by referring to the dialog history 
stored in the dialog history storage 33, and calculates the number of times of appearances (i.e., 
appearance frequency) of the target word. Further, in step S1, the dialog processor 31 determines 
whether the number of times of appearances of the target word is not less than a predetermined 
threshold. If it is determined that the number of times of appearances of the target word is less 
than the predetermined threshold, the dialog processor 31 returns to step S1 after waiting until 
the user speaks again. 

On the other hand, if it is determined in step S1 that the number of times 
that the target word appears is not less than the predetermined threshold, the processing flow 
goes to step S2 where the dialog processor 31 supplies the target word to the extractor 32 for 
acquiring a broader term of the target word. 

More specifically, upon receiving the target word from the dialog managing unit 3, the 
extractor 32 recognizes a broader term of the target word by referring to the thesaurus stored in 
the concept information database 36, and supplies the recognized broader term to the dialog 
processor 31. In this way, the dialog processor 31 acquires in step S2 the broader term of the 
target word supplied from the extractor 32. 

Subsequently, the processing flow goes to step S3 where the dialog processor 31 supplies, 
to the user information management unit 4 (the recording/reproducing unit 41 in Fig. 6), profile 
control information for instructing the broader term of the target word to be reflected in the user 
profile. The dialog processor 31 then returns to step S1 after waiting until the user utters a 
next speech. 

In this case, the recording/reproducing unit 41 of the user information management unit 4 
(Fig. 6) refers to the user profile (Fig. 7B) in the profile database 42 and increments by one the 
number of times each time the interest information corresponds to the broader 
term indicated by the profile control information from the dialog processor 31. 

Then, the dialog processor 31 instructs the recording/reproducing unit 41 to read out the 
profile management information (Fig. 7A) in the profile database 42, for thereby acquiring a 
threshold with respect to the interest information for which the number of times has been 
incremented. Further, the dialog processor 31 compares the threshold acquired as described 
above (hereinafter referred to also as the acquired threshold) with the number of times having 
been incremented (hereinafter referred to also as the incremented number of times), and 
determines which one of the acquired threshold and the incremented number of times is larger. 
Stated otherwise, the dialog processor 31 instructs the recording/reproducing unit 41 to read the 
incremented number of times out of the user profile in the profile database 42, and 
determines whether the read-out incremented number of times is not less than the acquired 
threshold. If the incremented number of times is not less than the acquired threshold, the dialog 
processor 31 controls the recording/reproducing unit 41 such that, when an interest flag for the 
interest information corresponding to the incremented number of times is at a level of "0", 
the interest flag is set to "1". 

Accordingly, for example, when the user is interested in movies and has spoken many 
words such as the cast names, director names, titles and the location sites of the movies, the 
interest flag for the interest information corresponding to "movie", which is a broader term of 
those words, is set to "1". 
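As a minimal sketch of the update described above, the increment of the number of times and the comparison with the threshold could be written as follows. The function name `reflect_in_profile` and the use of plain dictionaries are assumptions for illustration; they stand in for the recording/reproducing unit 41 and the profile database 42.

```python
def reflect_in_profile(counts, flags, thresholds, category):
    """Increment the count for `category` and set its flag once count >= threshold.

    counts, flags: mutable dicts standing in for the user profile (Fig. 7B).
    thresholds: dict standing in for the profile management information (Fig. 7A).
    Returns the updated (count, flag) pair.
    """
    counts[category] = counts.get(category, 0) + 1
    # "Not less than the acquired threshold" is the >= comparison.
    if counts[category] >= thresholds[category] and flags.get(category, 0) == 0:
        flags[category] = 1
    return counts[category], flags.get(category, 0)
```

With an assumed threshold of 2 for "movie", the first call leaves the flag at 0 and the second call sets it to 1; later calls keep incrementing the count while the flag stays at 1.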

Note that the profile collection processing of Fig. 9 is performed by employing, as target 
words, all of the words registered in the dialog history which is stored in the dialog history 
storage 33. 
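The flow of Fig. 9 (steps S1 through S3) could be sketched, under stated assumptions, as a single pass over the words in the dialog history. The function name `collect_fig9`, the flat word list, and the dictionary thesaurus are hypothetical simplifications; the waiting for the next utterance is omitted.

```python
from collections import Counter


def collect_fig9(history_words, thesaurus, word_threshold):
    """Return broader terms of words appearing at least `word_threshold` times."""
    frequency = Counter(history_words)   # step S1: appearance frequency per word
    reflected = []
    for word, n in frequency.items():
        if n < word_threshold:           # step S1: below threshold, skip the word
            continue
        term = thesaurus.get(word)       # step S2: acquire the broader term
        if term is not None:
            reflected.append(term)       # step S3: term to be reflected in the profile
    return reflected
```

Each returned term corresponds to one item of profile control information supplied to the user information management unit 4.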

Fig. 10 is a flowchart showing a second embodiment of the profile collection processing. 

In the embodiment of Fig. 10, the dialog processor 31 first, in step S11, refers to the 
dialog history stored in the dialog history storage 33 and controls the extractor 32 so as to acquire 
a broader term of each word registered in the dialog history. 

Then, the processing flow goes to step S12 where the dialog processor 31 focuses 
attention on a certain one of the acquired broader terms and calculates the number of times of 
appearances (i.e., appearance frequency) of the target broader term. Further, in step S12, the 
dialog processor 31 determines whether the number of times of appearances of the target broader 
term is not less than a predetermined threshold. If it is determined that the number of times of 
appearances of the target broader term is less than the predetermined threshold, the dialog 
processor 31 returns to step S11 after waiting until the user speaks again. 

On the other hand, if it is determined in step S12 that the number of times of 
appearances of the target broader term is not less than the predetermined threshold, the 
processing flow goes to step S13 where the dialog processor 31 supplies, to the user 
information management unit 4 (the recording/reproducing unit 41 in Fig. 6), profile control 
information for instructing the target broader term to be reflected in the user profile. The dialog 
processor 31 then returns to step S11 after waiting until the user speaks 
again. 

In this case, the dialog processor 31 executes similar processing as described above in 
connection with the first embodiment of Fig. 9. As a result, for example, when the user is 
interested in movies and has spoken many words belonging to a broader term "movie", such as 
the cast names, director names, titles and the location sites of the movies, the interest flag for the 
interest information corresponding to "movie" is set to "1". 

Note that the profile collection processing of Fig. 10 is performed by employing, as target 
broader terms, the broader terms of all the words registered in the dialog history which is stored 
in the dialog history storage 33. 

Also, while words are registered in the dialog history in the embodiment of Fig. 10, 
broader terms of words may be registered in the dialog history. 
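For comparison with the first embodiment, the flow of Fig. 10 (steps S11 through S13) could be sketched as follows: the words are first mapped to broader terms, and the frequency is then counted per broader term rather than per word. The function name `collect_fig10` and the dictionary thesaurus are assumptions for illustration only.

```python
from collections import Counter


def collect_fig10(history_words, thesaurus, term_threshold):
    """Return broader terms that appear at least `term_threshold` times."""
    # Step S11: acquire the broader term of each word in the dialog history.
    terms = [thesaurus[w] for w in history_words if w in thesaurus]
    # Step S12: appearance frequency of each broader term.
    frequency = Counter(terms)
    # Step S13: terms to be reflected in the user profile.
    return [t for t, n in frequency.items() if n >= term_threshold]
```

In this sketch, three different words that each appear once under "movie" reach a threshold of three, whereas no single word would reach that threshold when counted individually.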

Fig. 11 is a flowchart showing a third embodiment of the profile collection processing. 

In the embodiment of Fig. 11, the dialog processor 31 first, in step S21, performs simple 
determination as to whether the topic of a dialog between the user and the computer has shifted. 

The simple (rough) determination as to whether the topic has shifted can be performed, 
for example, as follows. 

First, the simple determination as to whether the topic has shifted can be performed based 
on the speaking speed supplied from the speech recognizing unit 1. In general, when the 
topic is shifted, the speaking speed tends to slow down and then increases to a higher pitch. If 
the speaking speed has changed in such a manner, it can be determined that the topic has shifted. 

Secondly, when shifting the topic, specific wordings, such as "Well, let's change the 
subject" and "Is there anything else?", are often used. If such a wording is contained in the 
language processing result from the language processing unit 2, it can also be determined that the 
topic has shifted. 

Thirdly, when the topic is shifted, similarity or correlation in the meaning between words 
(vocabularies), which are contained in both the language processing results outputted from the 
language processing unit 2 before and after the shift of the topic, tends to decrease. Therefore, 
whether the topic has shifted or not can be determined based on such similarity or correlation in 
the meaning between words. 

The similarity or correlation in the meaning between words can be calculated, for 
example, based on the thesaurus stored in the concept information database 36. In other words, 
similarity in the meaning between two words can be calculated, for example, based on a broader 
term in common to the two words using the thesaurus. 
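Two of the simple determinations described above, the check for specific wordings and the check for a broader term in common, could be sketched as follows. The cue list, the function names, and the one-level thesaurus are assumptions made for this example; the speaking-speed criterion is not modeled.

```python
# Hypothetical cue phrases, based on the examples of specific wordings given above.
CUE_PHRASES = ("let's change the subject", "is there anything else")


def has_shift_cue(utterance):
    """Return True if the utterance contains one of the assumed cue phrases."""
    text = utterance.lower()
    return any(cue in text for cue in CUE_PHRASES)


def share_broader_term(word_a, word_b, thesaurus):
    """Return True if both words have the same immediate broader term."""
    a = thesaurus.get(word_a)
    b = thesaurus.get(word_b)
    return a is not None and a == b
```

A drop in the proportion of word pairs for which `share_broader_term` holds, before and after an utterance, could then serve as a rough indication that the topic has shifted.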

If a result of the simple determination in step S21 shows that the topic is not shifted, the 
dialog processor 31 returns to step S21 after waiting until the user speaks again. 

On the other hand, if it is determined in step S21 that the topic has shifted, the processing 
flow goes to step S22 where the dialog processor 31 performs close determination (i.e., 
determination with higher accuracy than that of the simple determination) as to whether the topic 
of a dialog between the user and the computer has shifted. 

The close determination as to whether the topic has shifted is performed, for example, by 
reviewing the language processing result of a speech uttered from the user while referring to the 
dialog history. 

If it is determined in step S22 that the topic is not shifted, the dialog processor 31 returns 
to step S21 after waiting until the user speaks again. If it is determined 
in step S22 that the topic has shifted, the processing flow goes to step S23. 

While, in the embodiment of Fig. 11, whether the topic has shifted or not is determined 
by carrying out the simple determination and then the close determination, only the close 
determination may be carried out to determine whether the topic has shifted without carrying out 
the simple determination (this is equally applied to the processing of Figs. 12 and 13 described 
later). Note that the simple determination is inferior in the determination accuracy, but requires 
processing with a light load, whereas the close determination is superior in the determination 
accuracy, but requires processing with a heavy load. In the case of carrying out the close 
determination alone, therefore, there is no redundancy in the processing, but the close 
determination imposing a heavy load must be performed each time the user utters a speech. On 
the other hand, in the case of carrying out the simple determination and then the close 
determination, the processing is somewhat redundant, but the close determination imposing a 
heavy load is just required to be performed only when it is determined by the simple 
determination that the topic has shifted. 
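The trade-off described above could be sketched as a two-stage check in which the light simple determination gates the heavy close determination. The function name `topic_has_shifted` and the callable parameters are hypothetical; the actual determinations of steps S21 and S22 are passed in as assumptions.

```python
def topic_has_shifted(utterance, simple_check, close_check):
    """Run the light simple check first; run the heavy close check only if it passes."""
    if not simple_check(utterance):   # step S21: simple (rough) determination
        return False
    return close_check(utterance)     # step S22: close determination
```

In this arrangement the close determination is never invoked for utterances that the simple determination already rejects, which is the load saving the passage describes.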

In step S23, the dialog processor 31 calculates the number of speeches uttered by the user 
on the topic before shift, while referring to the dialog history, and then goes to step S24. 

Assume now that the following conversation, for example, is exchanged between the user 
and the interactive user-profile collecting system: 

1: sys>How do you spend the weekend? 

2: usr>Last week, I saw the film "A" at the movie theater 000. 

3: sys>Whom do you like in the cast? 

4: usr>Actress xxxx. 

5: sys>Recently, did you go to any other movie? 

6: usr>Say, I saw the film "B", too, two weeks ago. 

7: sys>Really? 

8: usr>Well, let's change the subject. 

9: sys>What subject? 

10: usr>I want to know about "CC". (3) 

In this conversation, the dialog processor 31 determines that the topic has shifted at the 
eighth speech when "8: usr> Well, let's change the subject." was uttered by the user. 

In the above conversation (3), "sys>" represents a dialog (synthesized sounds) 
issued by the computer (interactive user-profile collecting system), and "usr>" represents a 
speech uttered by the user. The numeral before "sys>" or "usr>" indicates the number 
of times speech has been issued, uttered or spoken by the user and/or the processor. 

Also, in the above conversation (3), the topic is shifted at the eighth speech by the user, as 
mentioned above, and the topic before the shift covers from the first speech by the system to the 
seventh speech by the system. During this period, the user utters three speeches or speaks three 
times, i.e., the second, fourth and sixth ones. In this case, therefore, the number of times 
that the topic was mentioned before the shift was three. 

Incidentally, the topic covered from the first speech to the seventh speech in the 
above conversation (3) is "movie". 
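As a sketch of the count made in step S23, the speakers of conversation (3) can be listed in order and the user's speeches counted up to, but not including, the speech at which the shift was detected. The list representation and the function name `count_user_speeches_before_shift` are assumptions for illustration.

```python
# Speakers of the ten speeches in conversation (3), in order.
TURNS = ["sys", "usr", "sys", "usr", "sys", "usr", "sys", "usr", "sys", "usr"]


def count_user_speeches_before_shift(turns, shift_number):
    """Count user speeches before the speech numbered `shift_number` (1-based)."""
    before_shift = turns[:shift_number - 1]
    return sum(1 for speaker in before_shift if speaker == "usr")
```

For a shift detected at the eighth speech, the speeches considered are the first through the seventh, and the count is three, in agreement with the passage above.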

In step S24, the dialog processor 31 determines whether the number of times 
a topic is spoken before being shifted is not less than a predetermined threshold. If it 
is determined that the number of times a topic is spoken is less than the 
predetermined threshold, i.e., if the user does not utter speeches or speak on the topic very many 
times before the shift and hence the user seems to be not so 
interested in the topic before the shift, the dialog processor 31 returns to step S21 after waiting 
until the user speaks again. 

On the other hand, if it is determined in step S24 that the number of times a 
topic is spoken is not less than the predetermined threshold, i.e., if the user speaks 
on the topic several times before the shift and hence the user seems to 
be so interested in the topic before the shift, the processing flow goes to step S25 where the 
dialog processor 31 supplies, to the user information management unit 4 (the 
recording/reproducing unit 41 in Fig. 6), profile control information for instructing the topic 
before the shift to be reflected in the user profile. The dialog processor 31 then returns to step 
S21 after waiting until the user speaks again. 

In this case, the recording/reproducing unit 41 of the user information management unit 4 
(Fig. 6) refers to the user profile (Fig. 7B) in the profile database 42 and increments by one the 
number of times each time the interest information corresponds to the topic 
indicated by the profile control information from the dialog processor 31. 

Then, the dialog processor 31 instructs the recording/reproducing unit 41 to read out the 
profile management information (Fig. 7A) in the profile database 42, for thereby acquiring a 
threshold with respect to the interest information for which the number of times has been 
incremented. Further, the dialog processor 31 compares the threshold acquired as described 
above (i.e., the acquired threshold) with the number of times having been incremented (i.e., the 
incremented number of times), and determines which one of the acquired threshold and the 
incremented number of times is larger. Stated otherwise, the dialog processor 31 instructs the 
recording/reproducing unit 41 to read the incremented number of times out of the user 
profile in the profile database 42, and determines whether the read-out incremented number of 
times is not less than the acquired threshold. If the incremented number of times is not 
less than the acquired threshold, the dialog processor 31 controls the recording/reproducing unit 
41 such that, when an interest flag for the interest information corresponding to the incremented 
number of times is at a level of "0", the interest flag is set to "1". 

Accordingly, for example, when the user is interested in movies and has 
spoken many times on the topic "movie" before changing the topic, the 
interest flag for the interest information corresponding to the topic "movie" is set to "1". 

While the embodiment of Fig. 11 has been described as calculating the number of times 
the user speaks on the topic before the shift, the number of times 
the topic is mentioned or spoken may be obtained by calculating not only the 
number of times the user speaks, but also the number of times the system 
speaks. 

Fig. 12 is a flowchart showing a fourth embodiment of the profile collection processing. 

In the embodiment of Fig. 12, the dialog processor 31 executes determination processing 
in steps S31 and S32 in the same manners as those in steps S21 and S22 in Fig. 11, respectively. 

Then, if it is determined in step S32 that the topic has shifted, the processing flow goes to 
step S33 where the dialog processor 31 calculates a total of the time during which the user has 
spoken and the time during which the system has spoken, by 
referring to the dialog history. Thereafter, the processing flow goes to step S34. 

More specifically, assuming, for example, that the above-described conversation (3) has 
been exchanged between the user and the system, the dialog processor 31 determines that the 
topic has shifted at the eighth speech "8: usr> Well, let's change the subject." uttered by the user. 
In this case, a period of time from the time at which the first speech by the system has started to 
the time at which the seventh speech by the system has ended is calculated in step S33 as a total 
time of the dialog on the topic before shift. 

Since the dialog history registers therein the time at which the user has uttered each 
speech, etc. as described above, the speech time can be calculated by referring to such time data 
stored in the dialog history. 
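The calculation of step S33 could be sketched as follows, assuming that the dialog history holds a start time and an end time for each speech. The tuple layout, the time values in seconds, and the function name `dialog_time_before_shift` are hypothetical; the specification states only that time data is registered in the dialog history.

```python
def dialog_time_before_shift(turn_times, shift_number):
    """Return the time from the start of the first speech to the end of the last
    speech before the speech numbered `shift_number` (1-based).

    turn_times: list of (start_seconds, end_seconds) tuples, one per speech.
    """
    before_shift = turn_times[:shift_number - 1]
    if not before_shift:
        return 0.0
    first_start = before_shift[0][0]
    last_end = before_shift[-1][1]
    return last_end - first_start
```

Because the span runs from the first start to the last end, it covers the speech time of both the user and the system, including any silence between speeches.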

In step S34, the dialog processor 31 determines whether the speech time on the topic 
before shift is not less than a predetermined threshold. If it is determined that the speech time is 
less than the predetermined threshold, i.e., if a conversation is exchanged between the user 
and the system for only a short time on the topic before a shift and hence the user seems not to 
be interested in the topic before the shift, the dialog processor 31 returns to step S31 after 
waiting for the user to speak again. 

On the other hand, if it is determined in step S34 that the speech time is not less than the 
predetermined threshold, i.e., if a conversation is exchanged between the user and the system for 
a relatively long time on the topic before shifting and hence the user seems to be so 
interested in the topic before the shift, the processing flow goes to step S35 where the dialog 
processor 31 supplies, to the user information management unit 4 (the recording/reproducing unit 
41 in Fig. 6), profile control information for instructing the topic before shift to be reflected in 
the user profile. The dialog processor 31 then returns to step S31 after waiting until the user 
utters a next speech. 

In this case, the dialog processor 31 executes similar processing as described above in 
connection with the third embodiment of Fig. 11. As a result, for example, when the user is 
interested in movies and a conversation regarding movies, i.e., such points as the cast names, 
director names, titles and the location sites of the movies, is exchanged between the user and the 
system for a relatively long time, the interest flag for the interest information corresponding to 
"movie" is set to "1". 

While the embodiment of Fig. 12 has been described as calculating a total time of the 
dialog or speech by both the user and the system on the topic before the shift, the 
speech time may be obtained by calculating only a time of the speech by the user or a 
time of the speech by the system. 

Fig. 13 is a flowchart showing a fifth embodiment of the profile collection processing. 

In the embodiment of Fig. 13, the dialog processor 31 executes determination processing 
in steps S41 and S42 in the same manners as those in steps S21 and S22 in Fig. 11, respectively. 

Then, if it is determined in step S42 that the topic has shifted, the processing flow goes to 
step S43 where the dialog processor 31 calculates the number of times at which a conversation 
has been exchanged on the topic after shift (i.e., the number of times of appearances of the topic 
after shift during the dialog) by referring to the dialog history. Thereafter, the processing flow 
goes to step S44. 
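The count made in step S43 could be sketched as follows, assuming that the dialog history yields the sequence of topics in the order in which they appeared, with the last entry being the topic after the shift. The list representation and the function name `topic_appearances` are assumptions for illustration.

```python
def topic_appearances(topic_sequence):
    """Count how many times the most recent topic has appeared in the sequence."""
    if not topic_sequence:
        return 0
    current_topic = topic_sequence[-1]  # the topic after the shift
    return topic_sequence.count(current_topic)
```

The result is the value that step S44 would compare with the predetermined threshold.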

In step S44, the dialog processor 31 determines whether the number of times of 
appearances of the topic after a shift is not less than a predetermined threshold. If it is 
determined that the number of times of appearances of the topic after the shift is less than the 
predetermined threshold, i.e., if a conversation is exchanged between the user and the system 
only a small number of times on the topic after the shift and hence the user seems to be not so 
interested in the topic after shift, the dialog processor 31 returns to step S41 after waiting 
until the user speaks again. 

On the other hand, if it is determined in step S44 that the number of times of appearances 
of the topic after the shift is not less than the predetermined threshold, i.e., if a conversation is 
exchanged between the user and the system in a relatively large number of times on the topic 
after shift and hence the user seems to be so interested in the topic after the shift, the processing 
flow goes to step S45 where the dialog processor 31 supplies, to the user information 
management unit 4 (the recording/reproducing unit 41 in Fig. 6), profile control information for 
instructing the topic after shift to be reflected in the user profile. The dialog processor 31 then 
returns to step S41 after waiting until the user utters a next speech. 

In this case, the dialog processor 31 executes processing similar to that described above in 
connection with the third embodiment of Fig. 11. As a result, for example, when the user is 
interested in movies and a conversation regarding movies, i.e., such points as the cast names, 
director names, titles and the location sites of the movies, is exchanged between the user and the 
system a relatively large number of times, the interest flag for the interest information 
corresponding to "movie" is set to "1". 

More specifically, assume that a conversation is exchanged between the user and the system 
and the topic shifts in the sequence of, e.g., a topic regarding movies, a topic regarding music, 
a request for a job, a topic regarding movies, a topic regarding books, a topic regarding movies, 
and a topic regarding movies. The number of appearances of the topic "movie" is then calculated 
to be four at the point in time when the topic has shifted to the last one regarding movies. 
Assuming that the predetermined threshold used in step S44 is four, the number of times for the 
interest information corresponding to "movie" in the user profile (Fig. 7B) is incremented by one 
after the topic has shifted to the last one regarding movies. Further, if the incremented number 
of times is not less than the threshold for the interest information corresponding to "movie" in 
the profile management information (Fig. 7A) (e.g., four in the example of Fig. 7A), the interest 
flag for the interest information corresponding to the topic "movie" in the user profile is set to "1". 
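
The example above can be traced with a short sketch. It is offered only as an illustration under stated assumptions: the dictionary layouts standing in for the user profile (Fig. 7B) and the profile management information (Fig. 7A), the function names, and the starting count of three are all hypothetical and are not taken from the specification.

```python
def count_appearances(topic_history, topic):
    """Step S43: count how often `topic` appears in the dialog history."""
    return sum(1 for t in topic_history if t == topic)

def reflect_topic(profile, management, topic_history, step_threshold):
    """Steps S44 and S45 for the most recent topic in `topic_history`.

    Returns True when the topic was reflected in the user profile.
    """
    topic = topic_history[-1]
    # Step S44: too few appearances, so nothing is reflected.
    if count_appearances(topic_history, topic) < step_threshold:
        return False
    # Step S45: increment the number of times held in the user profile.
    entry = profile.setdefault(topic, {"count": 0, "flag": 0})
    entry["count"] += 1
    # Set the interest flag once the count reaches the per-topic threshold.
    threshold = management.get(topic)
    if threshold is not None and entry["count"] >= threshold:
        entry["flag"] = 1
    return True

# Topic sequence from the example in the text.
topics = ["movie", "music", "job request", "movie", "book", "movie", "movie"]
management = {"movie": 4}                   # threshold as in Fig. 7A
profile = {"movie": {"count": 3, "flag": 0}}  # assumed starting state

reflected = reflect_topic(profile, management, topics, step_threshold=4)
```

With the assumed starting count of three, the single increment brings the number of times to four, which reaches the threshold of four, so the interest flag becomes 1.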

With the profile collection processing, as described above, while the user is exchanging 
some conversation with the system, user information regarding interests and tastes of the user is 
collected and reflected in a user profile. Therefore, the user profile reflecting the interests and 
tastes of the user can be easily prepared without imposing any burden on the user. Further, the 
interests and tastes of the user can be recognized by referring to the user profile. Consequently, 
for example, when searching information provided by WWW servers, those search results from 
search engines that match the user profile can be provided to the user, so that the user may 
easily obtain desired information. 
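
As one possible illustration of this use of the user profile, the sketch below keeps only the search results whose topics overlap the topics flagged in the profile. The result records, the `topics` field, and both function names are hypothetical; the specification does not prescribe how the matching is performed.

```python
def interested_topics(user_profile):
    """Topics whose interest flag is set to 1 in the user profile."""
    return {topic for topic, entry in user_profile.items()
            if entry.get("flag") == 1}

def filter_results(results, user_profile):
    """Keep the search results that share at least one flagged topic."""
    wanted = interested_topics(user_profile)
    return [r for r in results if wanted & set(r.get("topics", ()))]

# Made-up profile and search results used only for illustration.
profile = {"movie": {"count": 4, "flag": 1},
           "music": {"count": 1, "flag": 0}}
results = [
    {"title": "New film releases", "topics": ["movie"]},
    {"title": "Concert schedule", "topics": ["music"]},
    {"title": "Soundtrack reviews", "topics": ["music", "movie"]},
]

matched = filter_results(results, profile)
```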




Also, with the profile collection processing, since the interests and tastes of the user are 
collected while the user is exchanging some conversation with the system, interests and tastes 
of which the user is not consciously aware may sometimes be reflected in the user profile. 

It is to be noted that, in the present invention, the processing steps of the program that 
causes the computer to carry out the various kinds of processing need not always be executed in 
time series in the order described in the flowcharts; they may also be executed in parallel or 
individually (e.g., with parallel processing or object-oriented processing). 

Also, the program may be executed by one computer or a plurality of computers in a 
distributed manner. Further, the program may be executed by a computer at a remote location 
after being transferred there. 

Moreover, a sequence of the above-described processing steps may be executed by 
dedicated hardware rather than using software. 

While, in the embodiment described above, a response sentence is outputted from the 
system in the form of synthesized sounds, the response sentence may be displayed on a display 
unit. 

In the embodiment described above, interest flags each having one bit are provided in the 
user profile (Fig. 7B), and when the number of times is increased to a value not less than a 
threshold defined in the profile management information (Fig. 7A), the corresponding interest 
flag is changed from "0" to "1". However, the interest flag may have three or more different values. 
In that case, the value of the interest flag can reflect the degree of user interest in the 
corresponding interest information by incrementing the interest flag by one, for example, 
each time the number of times reaches one times, two times, and so on, the threshold 
defined in the profile management information (Fig. 7A). 
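
A multi-valued interest flag of this kind can be sketched with integer division. The function name and the sample values are hypothetical; the sketch only assumes that the flag increases by one each time the number of times reaches another multiple of the threshold.

```python
def interest_level(count, threshold):
    """Interest flag value: how many whole multiples of `threshold`
    the number of times `count` has reached."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return count // threshold

# With a threshold of four: 0 below 4, 1 from 4 to 7, 2 from 8 to 11, ...
levels = [interest_level(n, 4) for n in (3, 4, 7, 8, 12)]
```

A one-bit flag corresponds to capping this value at 1.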

Additionally, the user information regarding interests and tastes of the user is collected in 
the above-described embodiment, but the present invention is also applicable to the case of 
collecting other kinds of user information. 

According to the information processing apparatus, the information processing method, 
and the storage medium of the present invention, speech of a user is recognized and a 
dialog sentence for exchanging a dialog with the user is created based on a result of the 
speech recognition. Also, user information is collected based on the speech recognition 
result. Therefore, the user information regarding, e.g., interests and tastes of the user can be 
easily collected. 
ABSTRACT OF THE DISCLOSURE 

Speech of a user is recognized in a speech recognizing unit. Based on a 
result of the speech recognition, a language processing unit, a dialog managing unit and a 
response generating unit cooperatively create a dialog sentence for exchanging a dialog with the 
user. Also, based on the speech recognition result, the dialog managing unit collects user 
information regarding, e.g., interests and tastes of the user. Therefore, the user information 
regarding the interests and tastes of the user can be easily collected. 